Abstract. On August 15th, 2004, Venezuelans had the opportunity 
to vote in a Presidential Recall Referendum to decide whether or not 
President Hugo Chavez should be removed from office. The process was 
largely computerized using a touch-screen system. In general the ballots 
were not manually counted. The significance of the high linear correla- 
tion (0.99) between the number of requesting signatures for the recall 
petition and the number of opposition votes in computerized centers is 
analyzed. The same-day audit was found to be not only ineffective but 
a source of suspicion. Official results were compared with the 1998 pres- 
idential election and other electoral events and distortions were found. 
Key words and phrases: Referendum, election, voting machines, touch 
screen, ballot, correlation, uncertainty, audit. 
1. INTRODUCTION 

A referendum to recall President Hugo Chavez 
was carried out in Venezuela on August 15, 2004. 
The president was not recalled since the official "no" 
votes (votes in favor of the president) exceeded the 
official "si" votes (votes in favor of removing the 
president from his post). The Organization of Amer- 
ican States (OAS) and the Carter Center observed 
the proceedings and carried out some analyses of the 
voting data. They concluded that no tampering was 
apparent and that official results were accurate [3]. 

In this manuscript, we carry out a more in-depth 
analysis of both the voting data and the data that 
arose from two audits carried out after the recall ref- 
erendum. We focus on the association between the 
proportion of voters who had signed a petition to 
carry out the referendum and the actual proportion 
of "si" votes recorded at each voting center and com- 
pare what was observed relative to what might have 
been expected under some reasonable assumptions 
about voter behavior. We also highlight the differ- 
ences between what was observed and what might 
have been expected relative to the type of voting 
center (manual or computerized) and note that offi- 
cial results obtained from computerized voting cen- 
ters were surprising. 

We conclude that results from our analysis of the voting and auditing data suggest that official results may not be as accurate as the OAS/Carter Center report suggests. The objective of this article is to argue that a second look at the results of the Presidential Recall Referendum of 2004 in Venezuela might be justified.

2. THE ELECTORAL PROCESS IN VENEZUELA

Electoral events in Venezuela are organized by the "Consejo Nacional Electoral"¹ (CNE). On December 6, 1998 the current president won the elections with 3,673,685 (57.79%) votes versus 2,863,619 (42.21%) votes for his adversaries. The total number of voters in the electoral registry (REP) at that time was 11,001,913.

¹ Before the new constitution it was known as the "Consejo Supremo Electoral" (CSE); see http://www.cne.gov.ve.

In 1999 a new constitution was enacted which al- 
lows citizens to request a recall referendum (RR) 
to decide whether the president should continue in 
office. This referendum can only be activated after 
half of the period for which the president has been 
elected has transpired. In order to activate the ref- 
erendum, a petition signed by at least 20% of the 
voters registered in the REP has to be submitted to 
the CNE. It is also possible to request a consultative 
nonbinding referendum with the signatures of 10% 
of the voters registered in the REP. 

On January 3, 2000 a new CNE was appointed but it failed to organize elections as scheduled. Therefore, on June 5, 2000, yet another CNE was appointed. On July 30, 2000 the president was reelected for a 6-year period with 3,757,773 (59.76%) votes versus 2,530,805 (40.24%) for his adversaries. The REP had 11,701,521 registered voters at that time.

In 2002 signatures were collected requesting a consultative referendum, which was activated in the middle of a general national strike. The Supreme Court disabled the CNE, so this consultative referendum never took place. Citizens then collected signatures again, this time for a recall referendum. This was the legal instrument which the government and the opposition, represented by the Coordinadora Democrática, agreed to use, with the OAS and the Carter Center acting as guarantors [1]. This agreement ended the strike.

In 2003, the National Assembly was unable to 
agree on a new CNE, so the Supreme Court ap- 
pointed a new temporary CNE on August 26, 2003, 
even though this procedure was not contemplated 
in the constitution. The new CNE rejected the sig- 
natures of the petition for a referendum saying that 
they had been collected before half of the presiden- 
tial period had transpired. 

On November 28, 2003 signatures were collected 
once again, this time under the supervision of the 
CNE. On May 28, 2004, a significant fraction of the 
signatures had to be reverified by the CNE. Enough 
signatures were valid, so on August 15, 2004 the 
Presidential Recall Referendum finally took place. 

3. VOTE COLLECTION STRUCTURE 

Venezuela is politically organized into states, counties (municipalities) and townships (parishes). Each county has one or more voting centers. There can be several voting tables (voting stations) per center, and each one has one or more electoral notebooks. In computerized centers, one voting machine is assigned to each electoral notebook. One ballot box is assigned to each table. Therefore, the ballots from multiple machines may be combined in a single ballot box. See Figure 1 for the detailed layout of the system.

Fig. 1. Venezuelan vote collection structure.

Each voting center has a unique identifying code 
which makes it possible to compare electoral results 
on a center by center basis. 

Although the number of manual centers is large, 
the number of people registered in those centers is 
much smaller than those registered in computerized 
centers. These distributions are shown in the his- 
tograms of Figure 2. 

4. THE VOTING PROCEDURE 

There were only two ways to vote²: si (yes) or no. 
In order for the president to step down, the number 
of si votes had to be greater than 3,757,773 and 
greater than the number of no votes. 
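
The two conditions just stated can be written as a simple check. The following is only a minimal sketch: the function name `recall_succeeds` is illustrative, the threshold is the figure given above, and the tallies in the usage lines are hypothetical values chosen to exercise each condition.

```python
def recall_succeeds(si_votes: int, no_votes: int, threshold: int = 3_757_773) -> bool:
    """Return True only if both conditions stated in the text hold."""
    return si_votes > threshold and si_votes > no_votes


# Hypothetical tallies, not official data.
print(recall_succeeds(4_000_000, 3_900_000))  # True: both conditions met
print(recall_succeeds(4_000_000, 4_100_000))  # False: fewer si votes than no votes
print(recall_succeeds(3_500_000, 3_000_000))  # False: below the threshold
```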

Touch-screen voting machines were used for the first time in Venezuela for the Referendum. These machines also gave the voter a paper ballot to be deposited in a box. The boxes were never opened except for some of those selected for auditing. The results were sent electronically from the voting machines to the CNE servers using TCP/IP connections over telephone lines, after which the voting machines printed out the results, as well as a duplicate set of all the paper ballots in a continuous uncut format. The voting centers also had a continuous satellite TCP/IP connection which was to be used only by fingerprint machines which were supposed to prevent anyone from voting twice, even in different voting centers.

² In manual voting centers it was also possible to cast a null vote.

In order to give the citizens confidence in the re- 
sults, two audits were made. The first one was done 
on the same day as the Referendum (hot audit). The 
second one was carried out three days later (cold au- 
dit). 

The official results were 3,989,008 (40.64%) si votes versus 5,800,629 (59.10%) no votes, with 14,037,900 registered voters in the REP. A large fraction of the votes (87.1%) were cast at computerized voting centers.

The whole electoral process and the audits were 
supervised and endorsed by the OAS and the Carter 
Center. They found no evidence of alterations or 
tampering in the results in their final report. 

5. THE SIGNATURES 
5.1 Introduction 

In order to activate the Referendum, on November 28, 2003, signatures and fingerprints were collected in a four-day event organized by the CNE, with witnesses from all political parties. Special forms with serial numbers were supplied by the CNE to all political parties. There were 2,676 signature collection centers (SCCs), all of them in Venezuela. No signature collection was allowed outside Venezuela.

There were two kinds of forms: types A and B. 
Type A forms were used in the SCCs. Type B forms 
were also assigned to SCCs, but they were meant to 
be used for house to house signature-collecting (un- 
der pro-government witness supervision). There were 
618,800 type A forms and 98,286 type B forms. Each 
form had a maximum capacity of 10 signatures. 

The number of signatures required to activate the Referendum was 20% of the REP used to elect the president, that is, 0.2 × 11,701,521 = 2,340,304.2, rounded up to 2,340,305 signatures. The law required the publication in a newspaper of a list of ID numbers of all the people who signed the petition.
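
The arithmetic behind these figures can be reproduced as follows. This is a minimal sketch using only numbers quoted in this section; the helper name `min_signatures` is illustrative, and rounding up to the next whole signature is an assumption made here because a fractional signature cannot be collected.

```python
REP_2000 = 11_701_521      # registered voters when the president was elected (from the text)
TYPE_A_FORMS = 618_800     # from the text
TYPE_B_FORMS = 98_286      # from the text
SIGNATURES_PER_FORM = 10   # maximum capacity of each form (from the text)


def min_signatures(registered: int, percent: int = 20) -> int:
    """Smallest whole number of signatures that is at least `percent`% of `registered`."""
    # Integer ceiling division avoids floating-point rounding.
    return -(-registered * percent // 100)


required = min_signatures(REP_2000)
capacity = (TYPE_A_FORMS + TYPE_B_FORMS) * SIGNATURES_PER_FORM
print(required)  # 2340305
print(capacity)  # 7170860
```

Under these figures, the forms supplied could hold roughly three times the required number of signatures.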



The CNE divided the signatures into three categories: valid, invalid and questionable. A substantial number of questionable signatures had to be collected again in order to reach the required minimum number of signatures.

Opposition groups claimed to have submitted 3,467,051 signatures to the CNE. Within the CNE, 19,842 signatures were lost.³ An additional indeterminate number of signatures were lost before reaching the CNE.

It is reasonable to assume that most of those who 
signed requesting the Referendum intended to vote 
si in favor of the recall.⁴ However, it is also possible 
that some signers voted no. This might have been 
the case for government supporters who signed the 
petition because they believed they could use the 
referendum to help solve the high level of political 
confrontation in the country. There were also signers 
who changed their political preferences between the 
time of the signature collection and the vote. 

In the following sections, the official results of 
the referendum will be compared with the signa- 
tures collected. This will reveal some important facts 
about these results. 

5.2 Si Vote Uncertainty With Regard to Signatures

Let k be the relative number of si votes, as defined in equation (1):

\[
k = \frac{\text{si votes}}{\text{signatures}}. \tag{1}
\]

Also, let s be the relative number of signatures in a voting center, as defined in equation (2):

\[
s = \frac{\text{signatures}}{\text{si votes} + \text{no votes} + \text{null votes}} = \frac{\text{signatures}}{\text{total votes}}. \tag{2}
\]



³ See http://buscador.eluniversal.com/2004/05/09/apo_art_09152D.shtml

⁴ The OAS and the Carter Center concur with this statement. See [2], Section 5, second paragraph.
Fig. 3. Relationship between k and s for computerized and manual centers. The shadowed area contains the mathematically 
impossible values of k. The maximum k value is 1/s. The hollow dots represent voting centers located in consular offices. 



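For each value of s, there is a maximum possible k, which is just 1/s, as shown in equation (3):

\[
k_{\max} = \frac{\max(\text{si votes})}{\text{signatures}} = \frac{\text{total votes}}{\text{signatures}} = \frac{\text{total votes}}{s \cdot \text{total votes}} = \frac{1}{s}. \tag{3}
\]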

In voting centers with a large value of s, we expected a value of k around 1. This is because each signature has a high probability of resulting in a si vote, and at the same time k_max gets close to 1.

For example, in a voting center with 1,000 total votes and 900 signatures, the number of expected si votes is between 900 and 1,000. Here s = 900/1,000 = 0.9 and k_max = 1/0.9 ≈ 1.11. Therefore, the uncertainty in the value of k is very small, as it should be between⁵ 1 and 1.11.

The situation is completely different in voting centers with a small value of s. Notice that there is a singularity in k at s = 0, as shown in equation (4):

\[
k = \frac{\text{si votes}/\text{total votes}}{s}. \tag{4}
\]

This singularity can produce very high values of k in the neighborhood of s = 0. Hence, the level of uncertainty in k becomes very large.

For example, in a voting center with 1,000 total votes and 2 signatures, the number of expected si votes is between 2 and 1,000. Here s = 2/1,000 = 0.002 and k_max = 1/0.002 = 500. Therefore, the uncertainty in the value of k is extremely large, as it should be between 1 and 500.

⁵ The value of k could be lower than 1 if, for any reason, the number of votes was low (e.g., high abstention).
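
The two worked examples can be reproduced with a few lines of code. This is a minimal sketch; the function names are illustrative, and the lower bound of 1 rests on the assumption used in the examples that every signer votes si.

```python
def relative_signatures(signatures: int, total_votes: int) -> float:
    """s as defined in equation (2): signatures divided by total votes."""
    return signatures / total_votes


def k_range(signatures: int, total_votes: int):
    """Range of k = si votes / signatures.

    Lower bound 1 assumes every signer votes si; the upper bound is
    k_max = total votes / signatures = 1/s, reached if every voter votes si.
    """
    return 1.0, total_votes / signatures


for signatures in (900, 2):
    s = relative_signatures(signatures, 1_000)
    k_min, k_max = k_range(signatures, 1_000)
    print(f"s = {s:.3f}: k between {k_min:.2f} and {k_max:.2f}")
# s = 0.900: k between 1.00 and 1.11
# s = 0.002: k between 1.00 and 500.00
```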

The reasons for the uncertainty in k just discussed 
are purely mathematical. In practical terms, high 
values of k in centers with a small s were due to the 
following facts: 

• There were only 2,676 SCCs compared to 8,394 
voting centers. Therefore, voters living far from 
an SCC could not sign the petition, even if they 
wanted to. This was mostly the case in rural areas. 

• There were many people who did not sign the pe- 
tition because of their fear of retribution from the 
government. On the other hand, voting was secret. 

• There were si votes from people who could not 
sign because they were not in the REP or were 
outside the country at the time of signature col- 
lection. 

• Some SCCs ran out of forms. Not everyone was 
able to go to a more distant SCC to sign. 

• An undetermined number of signatures were lost. 

• There were si votes from people who just didn't 
bother to sign the petition. 

Notice that all these issues with the signatures did 
not affect all voting centers equally. Centers with 
a small value of s are more likely to have been af- 
fected by these issues than centers with a high value 
of s. 

A plot of k versus s is shown in Figure 3. Notice that when s is not large, all the computerized centers are very far away from k_max, clearly contradicting the expected nonlinear behavior with respect to s. On the other hand, the manual center results are effectively distributed in the allowed range regardless of the relative number of signatures.

Fig. 4. Relationship between k and total number of votes for computerized and manual centers in the same size range. The hollow dots represent voting centers located in consular offices.
In summary:

It is expected that the k values from voting centers with a small value of s will be much more variable than those from centers with large values of s.



5.2.1 Behavior of k with regard to the size and characteristics of the voting centers. Although the manual centers tend to have fewer voters than the computerized centers, this does not seem to be the only reason for the different behavior in k. This can be seen in Figure 4.

There were many small computerized voting cen- 
ters in rural areas. Many used mobile phone lines to 
connect the voting machines to the CNE servers to 
transmit the results because of the lack of regular 
phone lines in these remote areas. 

There were 586 townships which included both manual and computerized voting centers. These mixed townships had 5,449 voting centers (2,538 manual and 2,911 computerized). Notice in Figure 5 (top) that the behavior of k in these mixed townships is very different for manual and computerized centers. Appendix B shows an example of such a mixed township.

Fig. 5. Relationship between k and s for computerized (right) and manual centers (left) for mixed townships (top) and hamlets (bottom).

Fig. 6. Manual centers have a correlation of 0.607 with respect to the signatures while computerized centers have a correlation of 0.989. A correlation of 1 would look like a straight line.

Another interesting comparison is related to hamlets ("caserios"). A total of 2,162 voting centers in hamlets were identified⁶ (1,852 manual and 310 computerized).

Due to the reasons mentioned in Section 5.2, many hamlets must have been far away from an SCC. For this reason voting centers located in hamlets should include large values of k. In Figure 5 (bottom) it can be seen that these large values are found only in manual voting centers.

Furthermore, Figure 5 shows that the behavior of 
the k values in computerized voting centers in ham- 
lets looks more like that of the rest of the computer- 
ized centers than the behavior of the 1,852 manual 
centers located in the rest of the hamlets. 

5.3 Correlations Between Si Votes and Requesting Signatures

Let r_sf be the correlation of si votes with respect to the number of signatures.

The Carter Center and the OAS said the following in one of their reports [2]:

A very high correlation between the number of signers and the number of si votes per center in the universe of automated voting machines has been found: a correlation coefficient of 0.988. This means that in voting centers where a high signer turnout was obtained, a high si vote also was obtained.⁷

⁶ The official list of voting centers was searched for the word "CASERIO" in the address field. This produced the list of 2,162 voting centers.

What this report does not mention is that for 
manual voting centers, the correlation is 0.607, 
a much lower value. This difference can be visual- 
ized in Figure 6. Notice that a straight line from 
the origin to each of the points has a slope of k. 
The high correlation value for computerized centers 
translates into similar k values (or slopes) for most 
centers. 

In this case, the high correlation in computerized 
voting centers also implies that in voting centers 
where a low signer turnout was obtained, a low si 
vote was also obtained. This can be seen in the ori- 
gin of Figure 6(b). Hence, when the number of sig- 
natures tends to zero, the number of si votes also 
tends to zero. But, as observed in Figure 6(a), man- 
ual centers do not exhibit the same behavior. 

The behavior found in computerized centers seems unexpected because the relationship between signatures and si votes should not be linear, especially when the number of signatures is small. As explained in Section 5.2, one would expect a large number of si votes if there were a large number of signatures, but as the number of signatures per center decreases, the level of uncertainty in the number of si votes with respect to the number of signatures increases.

Table 1
Correlations of si votes with respect to the relative number of signatures s per center, for manual and computerized voting centers

                     s < 0.5               s > 0.5                 All
                Corr.   # centers     Corr.   # centers     Corr.   # centers
Manual          0.613     3,375       0.947       221       0.607     3,596
Computerized    0.983     3,943       0.994       645       0.989     4,588
Both            0.953     7,318       0.996       866       0.973     8,184

⁷ This correlation value was reproduced with a difference of just 0.001, which is negligible.

In Table 1 the correlations are calculated for cen- 
ters where signers were a minority (s < 0.5) and 
a majority (s > 0.5). Notice that as expected, the 
correlation for manual centers is much higher when 
there are many signatures (0.947) than when there 
are fewer signatures (0.613). This is the expected 
behavior because when you have many signatures 
the uncertainty of k is small, and the number of 
si votes is equal to k × signatures, so the uncer- 
tainty in the absolute number of si votes is also 
small. 

In the case of the 645 computerized voting centers 
where s > 0.5 the correlation was 0.994 which is very 
high. It stands out that in the computerized voting 
centers where signers were a minority, the correla- 
tion is still very high at 0.983. Furthermore, there is 
not a single computerized voting center with many 
more si votes than signatures as seen in Figure 6(b). 
In other words, for some reason, computerized cen- 
ters do not seem to show the expected nonlinear 
relationship between signatures and si votes. 
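
The kind of calculation summarized in Table 1 can be sketched as follows. This is only an illustration under stated assumptions: the function names and the tuple layout are hypothetical, the sample data are invented for the example and are not official results, and centers with s exactly equal to the threshold are left out of both groups, following the strict inequalities used in the text.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)  # undefined if either sequence is constant


def correlations_by_s(centers, threshold=0.5):
    """centers: list of (signatures, si_votes, total_votes) tuples."""
    def r(group):
        return pearson([c[0] for c in group], [c[1] for c in group])

    low = [c for c in centers if c[0] / c[2] < threshold]
    high = [c for c in centers if c[0] / c[2] > threshold]
    return {"low": r(low), "high": r(high), "all": r(centers)}


# Invented data: si votes scatter widely where s is small and track
# the signatures closely where s is large.
sample = [
    (50, 400, 1000), (100, 150, 1000), (200, 600, 1000), (300, 350, 1000),
    (600, 650, 1000), (700, 720, 1000), (800, 850, 1000), (900, 930, 1000),
]
for group, value in correlations_by_s(sample).items():
    print(f"{group}: {value:.3f}")
```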
5.4 Correlation Plot 

In order to further investigate the change of un- 
certainty as the relative number of signatures varies, 
a technique similar to a moving average is used. 
The difference is that instead of calculating an aver- 
age, a correlation is calculated. A window size of 150 
voting centers was used. This is the same number of 
centers that were audited. 

In order to do this, the first step is to sort the voting centers, computerized and manual, according to their s value. Then r_sf is calculated for centers in positions 1 to 150. Subsequently r_sf is calculated for centers in positions 2 to 151, and so on. The result is shown in Figure 7.

For manual centers, there are large variations in the correlation on the left side of Figure 7. This is the result of outliers moving in and out of the 150-center calculation window. As the outliers are real official data, they should not be dropped. Instead, logarithms can be used for both the number of votes and the number of signatures, which reduces the influence of outliers without discarding them. The result of using this technique is shown in Figure 8.

Regardless of whether correlations are calculated 
on a linear scale (Figure 7) or on a logarithmic scale 
(Figure 8), the important fact to point out is that 
the reduction in correlation as s decreases is large 
for manual centers, whereas it is negligible for com- 
puterized centers. 
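
The moving-window procedure described in this section can be sketched as follows. This is a minimal illustration, not the code used for Figures 7 and 8: the function names and the tuple layout are hypothetical, the mean s of each window is used as its position (an assumption, since the text does not say how windows are placed on the horizontal axis), and the logarithmic variant simply requires positive counts because the text does not say how centers with zero signatures or zero si votes were treated.

```python
from math import log, sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def rolling_correlation(centers, window=150, use_logs=False):
    """centers: list of (signatures, si_votes, total_votes) tuples.

    Returns one (mean s of window, correlation) pair per window position.
    """
    ordered = sorted(centers, key=lambda c: c[0] / c[2])  # sort by s
    results = []
    for start in range(len(ordered) - window + 1):
        chunk = ordered[start:start + window]
        x = [c[0] for c in chunk]  # signatures
        y = [c[1] for c in chunk]  # si votes
        if use_logs:
            # math.log raises ValueError for zero counts.
            x = [log(v) for v in x]
            y = [log(v) for v in y]
        mean_s = sum(c[0] / c[2] for c in chunk) / window
        results.append((mean_s, pearson(x, y)))
    return results


# Invented data with a small window, only to show the call pattern.
demo = [(10 * i, 10 * i + 5 * (i % 3), 1000) for i in range(1, 11)]
for mean_s, r in rolling_correlation(demo, window=4):
    print(f"mean s = {mean_s:.3f}, r = {r:.3f}")
```

With n centers and a window of w centers, the procedure yields n − w + 1 correlation values.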

Fig. 7. Correlations plot using a window of 150 voting centers.

Fig. 8. Correlations plot (logarithmic scale) using a window of 150 voting centers.

6. THE HYPOTHESIS

What has been presented thus far should be enough to cast a serious shadow of doubt regarding the official results in the computerized centers. Based on this, it is natural to consider the following hypothesis⁸:

Hypothesis. In computerized centers, official 
results were forced to follow a linear relationship 
with respect to the number of signatures. 

If this hypothesis were true, because of the rea- 
sons explained in Section 5.2, the results would be 
distorted with respect to reality, especially in voting 
centers with a small s value. 

In places where the signatures did not correctly 
capture the political intention of the people, two 
things would happen: 

1. The number of si votes, according to the official 
CNE results, would tend to be much lower than 
the number of real si votes. 

2. The official results of those computerized voting 
centers would be a poor representation of the po- 
litical intentions in the area. 

In the next section the results of the referendum 
will be compared to those of the 1998 presidential 
election in order to find out if these distortions are 
indeed present. 

7. 1998 ELECTION COMPARISON 

Despite the fact that more than 5 years separate the 1998 presidential election and the Referendum, and that the Referendum was not an election, there are reasons that make the comparison of both events interesting:

• In both cases the future of the presidency was at stake.

• In Venezuela, since 1958 a new president had been elected every 5 years. Immediate reelection was prohibited by the 1961 constitution. Between the 1998 election and the 2004 Referendum, 5 years and 8 months had gone by. On the other hand, the president had repeatedly claimed that he would stay in office at least until the year 2021.

• Both events were open to all Venezuelan citizens in the electoral registry.

• Both cases involved a very polarized electorate. In 1998 the top two candidates obtained 96.17% of the valid votes. The other 3.83% of the votes went to candidates who were also politically opposed to the winning candidate.

• There were 8,431 voting centers in 1998 and 8,394 voting centers for the Referendum. The events had 8,328 voting centers in common.

• Comparing the 1998 election and the Referendum results gives an estimate of whether the popularity of the president increased or decreased in the vicinity of each voting center.

Additionally, the 1998 electoral results are used for comparison because at that time, the CNE was not under the influence of the current government.

⁸ The mechanics of how votes could have been altered, and by whom, are not studied here. However, the fact that the machines established a TCP/IP connection to the CNE, disconnected and only then printed the results, opens many security holes. These issues are beyond the scope of this article.

7.1 Correlations Between % of Opposition Votes in 1998 and in RR

By comparing the electoral results (percentage of opposition) on a township by township basis, it was detected that some of them had a high correlation with respect to previous results while others had a very low correlation. The townships with higher opposition results with respect to 1998 tend to have a higher correlation than the others. This correlation will be called r_1998, and the percentage of opposition difference will be called Δ%_1998, as defined in equation (5):

\[
\Delta\%_{1998} = (\%\ \text{Opposition in RR}) - (\%\ \text{Opposition in 1998}). \tag{5}
\]



In order to illustrate this, the results of two townships are plotted in Figure 9. In the "Olegario Villalobos" township, the correlations with respect to the signatures and the 1998 percentage of opposition are large, at r_sf = 0.988 and r_1998 = 0.984 respectively. Additionally, notice that the average s is 0.639, so signers were the majority in this township. Therefore, the signatures are likely to have captured the political intentions of voters here.

In the case of the "Vista al Sol" township, the average s is very low. Therefore, the uncertainty in the number of si votes with respect to the signatures could be large, as was shown in Section 5.2. In other words, the signatures are not likely to have captured the political intentions of the township accurately. This uncertainty is just not seen in the official results, as the correlation of si votes with respect to the signatures is 0.990. Furthermore, the referendum results seem very distorted with respect to the 1998 election, with a negative correlation of −0.667. In this township, the center with the most opposition in 1998 ended up being the most pro-government, and vice versa.⁹

The two townships shown in Figure 9 behave consistently with the hypothesis. "Olegario Villalobos" was able to increase its percentage of opposition because many signatures were collected, whereas "Vista al Sol" could not increase its percentage of opposition because only a few signatures were collected. If this repeats itself in the rest of the country, then r_1998 would be large when Δ%_1998 is large, and r_1998 would be small when Δ%_1998 is small. In an untouched process, these two variables should be independent.
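
The two township-level quantities discussed here can be computed as in the following sketch. Several choices are assumptions made for illustration: the function names, the tuple layout, and the decision to compute the township percentage of opposition from pooled votes while computing the correlation from per-center percentages. The sample township is invented and mimics the reversal described for "Vista al Sol"; it does not reproduce official figures.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def township_summary(centers):
    """centers: list of (opp_1998, total_1998, opp_rr, total_rr) vote counts.

    Returns (difference in % of opposition, RR minus 1998, for the township;
             correlation between per-center % of opposition in 1998 and in the RR).
    """
    pct_1998 = [100 * c[0] / c[1] for c in centers]
    pct_rr = [100 * c[2] / c[3] for c in centers]
    township_1998 = 100 * sum(c[0] for c in centers) / sum(c[1] for c in centers)
    township_rr = 100 * sum(c[2] for c in centers) / sum(c[3] for c in centers)
    return township_rr - township_1998, pearson(pct_1998, pct_rr)


# Invented township: the ranking of centers is reversed between the events.
delta, r_1998 = township_summary([
    (300, 1000, 500, 1000),
    (400, 1000, 450, 1000),
    (500, 1000, 400, 1000),
])
print(f"difference = {delta:.1f} points, r_1998 = {r_1998:.3f}")
```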

In Figure 10, it is shown that, indeed, across the whole country there is a strong relationship between Δ%_1998 and r_1998 for computerized centers at the township, county and state levels. This relationship is much weaker, almost nonexistent, for manual voting centers. This finding is consistent with the hypothesis.

⁹ This center returned to being the one with the most opposition 77 days later in the state governors election.
8. VARIABILITY IN VALUES OF k AND THE CORRELATION BETWEEN PERCENTAGE OF OPPOSITION AND VALUES OF s, FOR VARIOUS ELECTORAL EVENTS

In Section 5.2, it was stated that as the value of s decreases, the variability in k is expected to increase. According to equation (4) this variability must also be present in the relation between s and the percentage of opposition. Therefore, as s becomes small, it should correlate poorly with the percentage of opposition. For this reason, when s is small, it should not determine the percentage of opposition. On the other hand, when s becomes large, it should correlate better with the percentage of opposition.

Let r_s be the correlation of the percentage of opposition and s, and let s̃ be the median of all the values of s for computerized centers. For the subset of computerized centers with s < s̃ this correlation will be called r_{s,s<s̃}, and for the remaining centers where s > s̃ the correlation will be called r_{s,s>s̃}.

The value of r_{s,s<s̃} should be smaller than r_{s,s>s̃}. These quantities are calculated for various electoral events in Table 2.
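
The median split just defined can be sketched as follows. The function names are illustrative and the sample values are invented to show the expected pattern (a weak correlation below the median of s and a strong one above it); they are not taken from any electoral event.

```python
from math import sqrt
from statistics import median


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_correlations(s_values, pct_opposition):
    """Return (r below the median of s, r above it, their difference)."""
    s_median = median(s_values)
    pairs = list(zip(s_values, pct_opposition))
    low = [(s, p) for s, p in pairs if s < s_median]
    high = [(s, p) for s, p in pairs if s > s_median]
    r_low = pearson([s for s, _ in low], [p for _, p in low])
    r_high = pearson([s for s, _ in high], [p for _, p in high])
    return r_low, r_high, r_high - r_low


# Invented centers: s and percentage of opposition.
s_values = [0.05, 0.10, 0.15, 0.20, 0.60, 0.70, 0.80, 0.90]
pct_opposition = [60, 20, 45, 30, 62, 70, 83, 88]
r_low, r_high, difference = split_correlations(s_values, pct_opposition)
print(f"r_low = {r_low:.3f}, r_high = {r_high:.3f}, difference = {difference:.3f}")
```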

The exit poll shown in Table 2 was made under the 
supervision of Penn, Schoen and Berland Associates. 

The State Governors election took place just 77 days after the Referendum. By counting votes for and against the pro-government candidate, a percentage of opposition was calculated. During this election, the same voting machines were used, but there was an important difference: the paper ballots were manually counted for a randomly selected voting machine in each and every voting center. The results for the correlation r_s for this election are shown in Table 2.

Table 2
Correlation r_s for computerized centers with s above and below s̃, for different electoral events

Date            Event                          r_{s,s<s̃}    r_{s,s>s̃}    r_{s,s>s̃} − r_{s,s<s̃}
Dec 6, 1998     Presidential election            0.439        0.685          0.246
Jul 30, 2000    Presidential election            0.607        0.802          0.195
Aug 15, 2004    Referendum official results      0.845        0.830         −0.015
Aug 15, 2004    Exit polls                       0.325        0.739          0.414
Oct 31, 2004    State Governors election         0.475        0.707          0.232

From Table 2 it is clear that only the Referendum official results fail to exhibit a positive correlation difference. Also notice in Figure 11 that for the Referendum official results, there is not a single voting center with a small s and large percentage of opposition. The fact that only in the official Referendum results r_{s,s<s̃} is not smaller than r_{s,s>s̃} is consistent with the hypothesis.
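
The differences in the last column of Table 2 can be checked directly from the two correlation columns. The short sketch below uses only the values printed in the table; the variable names are illustrative.

```python
# (date, event, r for s below the median, r for s above the median), from Table 2
TABLE_2 = [
    ("Dec 6, 1998", "Presidential election", 0.439, 0.685),
    ("Jul 30, 2000", "Presidential election", 0.607, 0.802),
    ("Aug 15, 2004", "Referendum official results", 0.845, 0.830),
    ("Aug 15, 2004", "Exit polls", 0.325, 0.739),
    ("Oct 31, 2004", "State Governors election", 0.475, 0.707),
]

# Round to three decimals to match the precision of the table.
differences = [round(r_high - r_low, 3) for _, _, r_low, r_high in TABLE_2]
not_positive = [
    event for (_, event, _, _), d in zip(TABLE_2, differences) if d <= 0
]

for (date, event, _, _), d in zip(TABLE_2, differences):
    print(f"{date:<13} {event:<28} {d:+.3f}")
print(not_positive)
```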

9. HOT AUDIT 

In general, the paper ballots from the computerized centers were not manually counted. The CNE assured the Venezuelan citizens that the voting machines had to accurately reflect the voters' intentions, because a sample of 192 machines (1% of them) would be randomly selected and audited on the same day as the referendum. This is indeed a valid way of eliminating suspicion, as long as the selection is a truly random sample of all the voting machines.
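
Why a truly random sample is effective can be illustrated with a simple probability calculation. The sketch below is an illustration under stated assumptions rather than part of the original analysis: the total of 19,200 machines is inferred from the statement that 192 machines are 1% of them, the number of altered machines is hypothetical, and the function name is illustrative. It computes the probability that a simple random sample drawn without replacement contains at least one altered machine.

```python
from math import comb


def detection_probability(total_machines: int, altered_machines: int, sample_size: int) -> float:
    """P(sample contains at least one altered machine), sampling without replacement."""
    unaltered = total_machines - altered_machines
    # comb() returns 0 when the sample is larger than the unaltered pool,
    # in which case detection is certain.
    return 1 - comb(unaltered, sample_size) / comb(total_machines, sample_size)


TOTAL = 19_200   # assumption: 192 machines are stated to be 1% of the total
SAMPLE = 192     # from the text

# Hypothetical scenario: 5% of all machines altered, sample drawn from all machines.
print(detection_probability(TOTAL, 960, SAMPLE))

# If the sample can only be drawn from a subset that contains no altered
# machines, the probability of detection is zero.
print(detection_probability(5_000, 0, SAMPLE))
```

The calculation only holds when every machine has the same chance of being selected, which is the condition stated in the text.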

The day of the referendum, the CNE informed the 
public that because of logistical reasons, the sam- 
ple would be taken from a restricted universe of 20 
counties located in urban areas, leaving out of the 
audit more than 300 counties. With this decision, 
confidence in the results was adversely affected to 
say the least. 

The computerized voting centers inside and out- 
side of the 20 counties, to which the hot-audit uni- 
verse was reduced, are shown in Figure 12. It is clear 
that these 20 counties are not representative of all 
the computerized voting centers. See Appendix E 
for further details on this subject. 

Furthermore, out of 192 centers selected for hot 
audit, only 26 were actually audited in the pres- 
ence of witnesses representing the opposition and 
the international observers. The following excerpt 
from the Carter Center Comprehensive Report [4] 
is very illustrative: 



Auditors, table members, and military per- 
sonnel were not properly informed that 
the audit would occur nor were they clear 
about the procedure to be followed. The 
instructions themselves did not clearly call 
for a separate tally of the Yes and No 
votes, and in some centers, the auditors 
only counted the total number of voters. 

(...) 

Nevertheless, Carter Center observers were 
able to witness six auditing processes. In 
only one of the six auditing sites observed 
by The Carter Center did the paper bal- 
lot receipt counting actually occur. In this 
place, the auditing was conducted by the 
mesa president, and the recount of the 
ballots produced exactly the same result 
as the acta printed by the voting machine. 
In the rest of the sites observed, the audi- 
tor appointed by the CNE did not allow 
the opening of the ballot box, explaining 
his/her instructions did not include the 
counting of the Yes and No ballots from 
multiple machines. 

There were also complaints of military deny- 
ing access to voting centers where audits 
were being conducted. Carter Center ob- 
servers could not confirm this claim. (...) 
The CNE provided The Carter Center with 
copies of the audit reports of 25 centers. 
It was clear from the forms that the au- 
dit was not carried out in many places 
because the fields in the form were left 
empty, there were no signatures of pro- 
government or opposition witnesses, etc. 
The forms were poorly filled out, clearly 
showing inadequate training. The instruc- 
tions issued by the CNE to the auditors 
were either incomplete or unclear. This is 
a direct consequence of issuing the audit 
regulation three days before the election. 
Fig. 11. Correlation between percentage of opposition and s for the lower two quartiles (s below its median) and the upper two
quartiles (s above its median) for computerized centers. The correlation for the lower two quartiles is expected to be smaller
than the correlation in the upper two quartiles. This expected difference is not seen in the Referendum official results.



The final result was that the CNE squan- 
dered a crucial opportunity to build con- 
fidence and trust in the electoral system 
and outcome of the recall referendum. 

Auditing only 26 centers out of the 192 selected
centers is basically a cancellation of the auditing
process. But is there anything special about these 26
centers? If this drastic reduction in audit size happened
because the audit was "poorly executed," and poor
execution is independent of the value of s, then the
values of s of these 26 centers would behave as a random
sample of the s values of the 192 selected centers.



From Figure 13, it is clear that the 26 centers that
were actually audited seem to have a much higher
value of s than the 192 centers from which they
were drawn. The average s for the 192 selected centers
is $\bar{s}_{\text{selected}} = 0.372$ while for the audited ones it is
$\bar{s}_{\text{audited}} = 0.540$. Additionally, the distribution of the
192 selected centers is positively skewed while the
distribution of the 26 audited centers is negatively
skewed.

Fig. 12. Centers inside (a) and outside (b) of the 20 counties to which the hot-audit drawing was restricted.

Can this be just a coincidence? A Monte Carlo
simulation was done, selecting 26 random centers out of
the 192 selected for auditing. The result of this
simulation is that the probability of having
$\bar{s}_{\text{audited}} = 0.540$ is 1 in 50 million; and this does not
take into account the difference in skewness.
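
A minimal sketch of this kind of Monte Carlo check is given below. It is not the code used for the study: the function name is hypothetical, the s values are synthetic, Python's default generator replaces the one used in the original computation, and the event counted is taken to be "sample mean at least as large as the threshold", which is an assumption about how the reported probability was defined.

```python
import random
import statistics


def prob_mean_at_least(values, sample_size, threshold, n_sims=100_000, seed=0):
    """Estimate the probability that a simple random sample of `sample_size`
    items drawn without replacement from `values` has a mean >= `threshold`."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        sample = rng.sample(values, sample_size)  # draw without replacement
        if statistics.fmean(sample) >= threshold:
            hits += 1
    return hits / n_sims


# Synthetic stand-in for the s values of the 192 selected centers (not real data).
data_rng = random.Random(42)
synthetic_s = [data_rng.betavariate(2, 3) for _ in range(192)]

estimate = prob_mean_at_least(synthetic_s, 26, 0.540, n_sims=20_000)
print(estimate)
```

A probability as small as 1 in 50 million cannot be resolved by counting hits in a few thousand draws; it requires either a far larger number of simulations or an estimate based on the shape of the simulated distribution of the mean.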

This result is consistent with the hypothesis,
because centers with a small value of s are the ones
most susceptible to distortions.

Thus, it has been shown that the audited centers
are representative of neither the universe of
all computerized centers nor the restricted universe
used to select them.

The audited centers are not representative of the
universe of computerized voting centers (see Figure 14)
because:

1. In the audited centers, the si vote won by 63.47%
to 40.91%.

2. $\Delta\%_{1998}$ is very different.

3. The value of s is much larger.

Fig. 13. Comparison between the s value of the 192 selected centers and the 26 audited centers. TOP: The selected centers
are ordered according to the value of s and plotted. BOTTOM: Back-to-back stem-and-leaf plot showing the same values of s
as in the top figure. To obtain s values, multiply stem by 0.1 and leaves by 0.01.

average s and $\Delta\%_{1998}$ are indicated with lines.

Additionally, the townships, counties and states
where centers were audited are not representative
of the other townships, counties and states. They
are not representative with regard to their $\Delta\%_{1998}$
and the correlation with respect to the 1998 election,
$r_{1998}$. This can be seen in Figure 15.

10. COLD AUDIT 

Given that the hot audit failed to serve
its purpose, another audit was made three (3) days
after the referendum. This audit cannot validate the
official results, mainly for two reasons:

• The audited entity itself cannot select the centers 
to be audited. According to the OAS/Carter re- 
port [3] "The sample was generated by CNE staff" 
on its own computer using its own software. 

• The control mechanisms that had been implemen- 
ted to certify that the samples were unaltered 
were not used. 

The draw to select the centers to be audited was
broadcast live on the official television station, but
the results were not shown. Usually, the whole idea
of transmitting a draw on TV is to let the public
know the results as they are being generated.

When the ballot boxes were closed and sealed, and 
the electoral centers closed, the seal was signed by 
witnesses. The boxes were then taken into custody 
of the military. 

The following excerpt from the OAS/Carter Center
report [3] explains the mechanism used to certify
that the boxes were unaltered:

Each box was physically checked to see 
whether: 

1. The material used to seal the box was 
intact or whether there were signs that 
it had been taken off and then replaced. 

2. There were cracks or holes through 
which votes might have been extracted 
or inserted. 

If a box was defective in regard to seal- 
ing, cracks, or holes, all the boxes of that 
polling station were excluded from the au- 
dit and a note to that effect recorded in 
the minutes. 

However, the witnesses who had signed the boxes 
were not called to certify the authenticity of the box. 

When this audit was questioned, the Carter Cen- 
ter and OAS response was that: 

Furthermore, the correlation between the 
signers and the si votes is almost identical 
in the universe and in the sample. The 
difference between the correlations is less 
than 1 percent: 



                Correlation coefficient
    Universe    0.988
    Sample      0.989



This certainly can be used to argue that the boxes 
opened were representative of the official results, but 
does not indicate anything in regard to validating 
the official results. 
Fig. 15. Townships, counties and states where the 26 audited centers are located. The vertical axis is the correlation with
the percentage of opposition in 1998 ($r_{1998}$). The horizontal axis is the difference in percentage of opposition with
1998 ($\Delta\%_{1998}$).



Interestingly, the draws for the hot and cold audit 
included 16 common centers. These 16 centers were 
successfully cold-audited, but none of them were al- 
lowed to be hot-audited. 

11. CONCLUSIONS 

We have explored the voting data arising from the
RR carried out in 2004 and also the results of two
audits conducted after the RR took place. We have
identified several issues associated with the results
obtained from voting centers using touch-screen
voting machines. In particular:

1. The official si results in computerized centers seem 
to behave in an excessively linear fashion relative 
to the number of signatures in support of the RR 
in each voting center (see Section 5). 
2. The official si results in computerized centers are
surprising given the results of the 1998 elections
in those same centers (see Section 7).

3. The percentage of votes for the opposition seems
to be too highly correlated with s, the relative
number of signatures in a voting center, in particular
in those centers where s was small (see Section 8).

When combined with the facts that in general, 
paper ballots were not counted and that voting ma- 
chines were connected to a central CNE server before 
voting results could be printed, these observations 
suggest that the official results obtained from com- 
puterized voting centers deserve a closer look. 

In principle, two audits — a hot audit carried out 
immediately following the referendum and a cold au- 
dit carried out three days later — should have helped 
resolve any questions arising about the voting and 
vote counting processes. However, an analysis of the 
data that resulted from the two audits reveals that 
the audits were not conducted as had originally been 
announced and thus could not alleviate doubts about 
the official results nor could they be used to certify 
the accuracy of results. In particular, we argue that: 

1. The computerized centers in the 20 counties to 
which the hot audit was restricted by the CNE 
were not representative of the universe of com- 
puterized voting centers (Figure 12). 

2. The hot-audited centers were not representative 
of the rest of the computerized centers (Figure 14). 

3. Townships, counties and states where computer- 
ized centers were hot-audited were not a repre- 
sentative sample of townships, counties and states 
in Venezuela (Figure 15). 

4. The centers that were hot-audited do not appear
to be a random sample of all computerized voting
centers, and it is difficult to believe that the
unexpected sample of audited centers was due to
chance alone. Note that the centers that were actually
audited were drawn from a subsample of centers
with a high proportion of signatures (Figure 13).

5. Audits were suspended in centers with low s, 
where the linearity in the official results is most 
questionable. 

While none of this constitutes proof of tampering,
we believe that our analyses of some of the data
collected in association with the recall referendum
cast some doubt on the accuracy of the official
results. If in fact it is reasonable to assume that:

• A person who signed the form requesting a
referendum was likely to vote si.

• A person who did not sign the form is not
necessarily likely to vote no,

then the very high correlation between the proportion
of signers and the proportion of si votes at a center
should be viewed with suspicion rather than as a
confirmation that official results are believable, as
the OAS/Carter Center report claims. Indeed, an excerpt
from the report states that:

"There is a high correlation between the 
number of YES votes per voting center 
and the number of signers of the presi- 
dential recall request per voting center; 
the places where more signatures were 
collected also are the places where more 
YES votes were cast. There is no anoma- 
ly in the characteristics of the YES votes 
when compared to the presumed inten- 
tion of the signers to recall the president." 

We argue exactly the opposite and have provided 
persuasive arguments to support our position. 

APPENDIX A: DATA PROCESSING 
METHODOLOGY 

Official Referendum results were downloaded
from the CNE website:
http://www.cne.gob.ve/referendum_presidencial2004/.

The download was automated using a custom-made
Perl script. All the data was stored in a MySQL
database. Calculations were made using Mathematica 5.2,
which was connected to MySQL using the DatabaseLink
package. Electoral results from the 1998 presidential
election were obtained on an original CNE
CD-ROM, and the data was converted from Microsoft
Access to MySQL. The REP from July 2004
was also converted from MS Access to MySQL. The
CNE signature data was obtained on a CD from
Sumate, and is the same version given to the OAS
and the Carter Center. This data was supplied in
a single text file.

By matching people's identification numbers (cédula
number) from the signatures and REP data,
it was possible to find the number of signatures per
voting center.
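
The matching step just described can be illustrated with a short sketch. It does not reproduce the MySQL queries actually used: the function name, the in-memory data layout, and the decision to count each identification number only once are assumptions made for illustration, and the identification numbers shown are made up.

```python
from collections import Counter


def signatures_per_center(signer_ids, rep_center_by_id):
    """Count signatures per voting center by joining signer ID numbers
    against a registry mapping ID number -> voting center code.

    Returns (counts, unmatched), where `unmatched` is the number of
    distinct signer IDs that do not appear in the registry.
    """
    counts = Counter()
    unmatched = 0
    for cedula in set(signer_ids):  # assumption: duplicate signatures count once
        center = rep_center_by_id.get(cedula)
        if center is None:
            unmatched += 1
        else:
            counts[center] += 1
    return counts, unmatched


# Hypothetical miniature registry and signature list (not real data).
rep = {"1001": "A", "1002": "A", "1003": "B"}
signers = ["1001", "1002", "1002", "9999"]
counts, unmatched = signatures_per_center(signers, rep)
print(dict(counts), unmatched)
```

Keeping track of the unmatched identification numbers makes it possible to check how much of the signature file could not be assigned to any voting center.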

In order to classify voting centers into manual and 
computerized, the following sources of information 
were used: 

• Sumate's list of computerized and manual voting 
centers. 

• On the CNE website, computerized centers show
results down to the voting machine level, whereas
manual voting centers show results down to the
voting table level.

Fig. 16. Partial aerial view of Miguel Peña township (taken from the Google Earth™ mapping service). Manual versus
computerized voting centers are compared with regard to their k value and total number of votes (TV). The image is centered
at Latitude 10°7'32.66"N and Longitude 68°1'22.48"W.

The list of computerized and manual centers ob- 
tained using the aforementioned sources was com- 
pared on a township by township basis with the 
CNE infrastructure document [5]. 

The list of centers effectively audited on the day
of the Referendum was obtained from a document
given by the Coordinadora Democrática to the Carter
Center and OAS. A copy of this document and the
data needed to reproduce this study can be found
at: http://esdata.info/2004.

The coordinates of the voting centers shown in 
Appendix B were provided by "Delta Electoral." 

The simulation was done using a deck-of-cards
shuffling algorithm. The random number generator
used by this algorithm was the "Wolfram rule 30
cellular automaton generator for integers," which is
provided by Mathematica.
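
As an illustration of what a deck-of-cards shuffling draw looks like, the sketch below uses a partial Fisher-Yates shuffle. It is an assumption that this is the kind of algorithm meant; the function name is hypothetical, and Python's built-in generator is substituted for the rule 30 generator, which is not reproduced here.

```python
import random


def draw_by_shuffling(items, k, rng):
    """Draw k distinct items by shuffling only the first k positions of a
    copy of the deck (partial Fisher-Yates shuffle)."""
    deck = list(items)
    n = len(deck)
    for i in range(k):
        j = rng.randrange(i, n)  # pick uniformly among the not-yet-fixed cards
        deck[i], deck[j] = deck[j], deck[i]
    return deck[:k]


# Example: draw 26 of 192 center indices with a seeded generator.
rng = random.Random(2004)
drawn = draw_by_shuffling(range(192), 26, rng)
print(sorted(drawn))
```

Each position is filled from the cards not yet fixed, so every subset of 26 centers is equally likely, provided the underlying generator is uniform.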

APPENDIX B: A MIXED TOWNSHIP 
EXAMPLE 

Miguel Peña is a township in Valencia County, in
the state of Carabobo. It is one of the most populous
townships in the country. It had 32
voting centers, 28 computerized and 4 manual.



In Figure 16, a partial aerial view of this township 
is shown. In it, notice that manual and computerized 
voting centers are in the same urban neighborhood. 
Despite this, the values of k are much higher for the 
manual centers than for the surrounding computer- 
ized centers, regardless of the total number of votes. 

In Figure 17 notice that in this township, the low- 
est k value of the 4 manual centers is greater than 
the maximum k value of the 28 computerized voting 
centers. 

APPENDIX C: ADDITIONAL NONLINEARITY 
PLOTS 

According to the exit polls made under the
supervision of Penn, Schoen and Berland Associates,
the opposition won the Referendum by a wide margin.
By changing the numerator of equation (4) from
percentage of si votes to percentage of si from exit
polls, a value of $k_{\text{exit polls}}$ can be calculated. The
result, for computerized centers only, is plotted in
Figure 18.

Similarly, $k_{1998}$ can be calculated by using the
percentage of opposition in the 1998 presidential
election in the numerator of equation (4). The result, for
computerized centers only, is shown in Figure 19.

Fig. 17. Behavior of k versus total votes in all of Miguel Peña's voting centers. Manual and computerized centers are shown.



APPENDIX D: MONTE CARLO
SIMULATIONS FOR CORRELATION
BETWEEN $\Delta\%_{1998}$ AND $r_{1998}$

Assuming that $\Delta\%_{1998}$ and $r_{1998}$ are independent,
regardless of being calculated at state, county or
township level, then the correlation between them, r*,
must be due to chance. In order to find the probability
that the observed r* is due to chance, it is possible to
reorder the values of $r_{1998}$ with respect to
$\Delta\%_{1998}$. This reordering was made 100,000 times and
the correlation was calculated each time. In all cases,
the resulting distribution was found to be normal. The
estimated probabilities for manual and computerized
centers at state, county or township level are shown in
Figure 20.
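
The reordering procedure described above can be sketched as follows. This is an illustration, not the original Mathematica code: the function names are hypothetical, the input values are made up, a two-sided tail of the fitted normal distribution is assumed, and the example uses far fewer reorderings than the 100,000 reported.

```python
import math
import random
import statistics


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_p_value(x, y, n_perm=100_000, seed=0):
    """Reorder y at random n_perm times, fit a normal distribution to the
    simulated correlations, and return (observed r, z score, p value)."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    shuffled = list(y)
    sims = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        sims.append(pearson(x, shuffled))
    mu = statistics.fmean(sims)
    sigma = statistics.pstdev(sims)
    z = abs(observed - mu) / sigma
    p = math.erfc(z / math.sqrt(2))  # two-sided normal tail (assumption)
    return observed, z, p


# Made-up values for ten hypothetical regions (not real data).
delta_pct = [-0.12, -0.08, -0.05, -0.02, 0.00, 0.01, 0.03, 0.06, 0.09, 0.11]
r_1998 = [0.95, 0.91, 0.93, 0.85, 0.88, 0.80, 0.82, 0.75, 0.78, 0.70]
print(permutation_p_value(delta_pct, r_1998, n_perm=5_000))
```

Using the fitted normal distribution rather than counting simulated correlations beyond the observed one is what allows probabilities much smaller than 1/100,000 to be quoted from 100,000 reorderings.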



APPENDIX E: DIFFERENCES IN 
CHARACTERISTICS, OFFICIAL RESULTS 
AND REP VARIATION OF THE 20 COUNTIES 
SUBJECT TO HOT-AUDIT DRAWING IN 
COMPARISON TO THE OTHER COUNTIES 

When the CNE decided to restrict the audit to 20 
urban counties, it created two groups of computer- 
ized centers: 

• 2,040 computerized centers inside the 20 counties
and therefore subject to selection in the draw.
Variables referring to these centers will use 20
as a subindex ($\cdot_{20}$).

• 2,553 computerized centers not subject to hot au- 
dit at all. Variables referring to these centers will 
use a as a subindex 




Fig. 18. Exit polls at computerized centers. 




[Figure 20 panels: manual centers and computerized centers, at the state, county and township levels.]

Fig. 20. Comparison of official results correlation r* versus expected value distribution found after 100,000 simulations for
manual and computerized centers at state, county or township level. The simulation results follow a normal distribution, which
is shown as a dotted line. The probability of the official r* happening by chance is indicated as p.

Fig. 21. Comparison of s probability density function (pdf) and cumulative distribution function (cmf) for computerized centers
inside the 20 counties of the hot audit and in the 302 excluded counties.



In Figure 12 it is shown that the behavior in com- 
puterized centers in the 20 counties is very different 
from that of the rest of the country. 

E.1 Differences in Characteristics

When the CNE set up the signature collection 
event, it established the number of signature col- 
lection centers (SCC) directly in proportion to the 
number of people in the electoral registry (REP) 
for each county. A lot of people live in urban coun- 
ties, therefore, a lot of SCCs were assigned to these 
counties. Thus, access from where the people lived 
to where they had to sign was much easier in these 
20 counties. On the other hand, voting centers are 
more numerous and better distributed throughout 
the national territory. 

For example, a county like Chacao in Miranda
state has 27 km² of area and 11 SCCs. In
Chacao there were 24 voting centers, all of them
computerized. On the other hand, the much larger
Macanao Peninsula in Margarita Island has an area
of 330.7 km² and only had 3 SCCs. There were 8
voting centers in Macanao, all of them computerized.

In Figure 21, it can clearly be seen that the 20 
counties have higher s values which is consistent 
with the ideas just explained. 

There were many computerized centers in rural
areas where it was much more difficult to sign than
to vote. When the audit universe was restricted to
20 urban counties, all computerized centers in rural
areas, the ones with a higher uncertainty in k, were
excluded from the hot-audit drawing universe.



E.2 Differences in Results 

When the value of s decreases, the k values are
in general expected to increase; after all,
$k_{\max} = 1/s$. Hence, a larger k is expected in
rural counties than in the 20 counties of the hot
audit where signing was less troublesome. However,
in the official results, exactly the opposite occurred,
as shown in Figure 22.

Considering that for the official referendum results
$k_{20}$ is the average over 2,040 voting centers and the
k of the excluded counties is the average over the
remaining 2,553 voting centers, how likely is it that,
just by chance, $k_{20}$ would be larger than the k of
the excluded counties by 3.4%? What could be expected
is that $k_{20}$ would be smaller. Contrary to the
official results, in the exit polls and in the 1998
election $k_{20}$ is significantly smaller than the k of
the excluded counties, as shown in Figure 22.

As seen in Figure 23, the distribution of k values
among the 2,040 auditable centers is quite different
from that of the 2,553 nonauditable centers.
The k values in the 2,040 auditable centers tend to
be larger than in the other 2,553 nonauditable centers.
The proportion of centers with k smaller than or near
1 is much smaller in the 2,040 auditable centers
than in the other 2,553. That is contrary to what
happened in the 1998 election and in the exit poll.
Additionally, note that the k pdf seems to be much
more symmetric than that in the 1998 results or the
exit polls.

Fig. 22. Comparison of average k and s values for the computerized centers inside and outside the 20 counties to which the
hot-audit universe was restricted. These k and s values are shown for the official referendum results, for the 1998 presidential
election and for the referendum exit polls.

[Figure 23 panels: computerized centers, Referendum official results; computerized centers, official results in the 1998 election. Legend: 20 counties in the hot-audit drawing; 302 counties excluded from the hot-audit drawing.]

Fig. 23. Comparison of k probability density function (pdf) and cumulative distribution function (cmf) for computerized centers
inside the 20 counties of the hot audit and in the 302 excluded counties. The maximum cmf difference (Supremum) for the
official results is shown as D.

Fig. 24. REP variation versus $\Delta\%_{1998}$ in computerized centers inside and outside the 20 counties of the hot-audit drawing.
A least-squares line is included in both cases.

How likely is it that the $k_{20}$ cmf would be below the
cmf of k in the excluded counties with such a large
difference (D = 0.233)? Being conservative and assuming
that both k distributions came from the same continuous
distribution, the probability can be estimated using the
Kolmogorov-Smirnov test for two samples. This probability
was found to be on the order of 2.6 × 10^-54.
For the reasons previously explained, the distribution
of k in the excluded counties should be greater than,
not equal to, that of $k_{20}$. Hence, the actual
probability should be much smaller.
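
The two-sample Kolmogorov-Smirnov calculation referred to above can be sketched as follows. This is a hedged illustration rather than the original computation: the function names are hypothetical, and the p value uses the standard asymptotic series for the Kolmogorov distribution together with a commonly used finite-sample adjustment of the effective sample size, which may differ from the routine actually employed.

```python
import math


def ks_statistic(a, b):
    """Two-sample KS statistic D: the largest absolute difference between
    the empirical cumulative distribution functions of a and b."""
    a, b = sorted(a), sorted(b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] == x:  # step past all copies of x in a
            i += 1
        while j < m and b[j] == x:  # step past all copies of x in b
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


def ks_pvalue(d, n, m, terms=100):
    """Asymptotic two-sided p value for a two-sample KS statistic d
    obtained from samples of sizes n and m."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d  # finite-sample adjustment (assumption)
    if lam < 0.3:
        return 1.0  # the alternating series is unreliable here; p is ~1
    total = 0.0
    for j in range(1, terms + 1):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


# Sample sizes and D taken from the text; the p value is recomputed, not quoted.
print(ks_pvalue(0.233, 2040, 2553))
```

Because the test is two-sided and only assumes a common distribution, the resulting value is an upper bound in the sense argued in the text.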



E.3 Electoral Registry (REP) Differences 

Between April and July 2004, 1,842,959 (14.9%)
voters were added to the REP. In the computerized
centers the number of registered voters went from
10,849,321 to 12,390,159. In Figure 24 it is shown
how differently these increments were distributed
in the computerized centers. Furthermore, in Figure 25,
it can be seen that the 192 centers selected
to be hot audited exclude an area where the government
has important gains without a big increase in
the REP.

Fig. 25. REP variation versus $\Delta\%_{1998}$ in all computerized centers indicating the 192 selected for hot-auditing. None of the
192 selected centers were in the rectangle area.